Information Processing from Document Images

Author

  • M. N. S. S. K. Pavan
Abstract

Analysis of document images for information extraction has become prominent in the recent past. A wide variety of information that was conventionally stored on paper is now being converted into electronic form for better storage and intelligent processing. This requires processing documents with image analysis algorithms. Document image analysis differs from conventional image processing in its format and information content. Document images are usually rich in formally presented information, so the subjectivity associated with natural image analysis is absent. Information in these images is more structured, and is presented in a natural language with the help of a grammar and a script. Consider an image of a document page as shown in Figure 1(a). It contains text blocks and images. Text blocks can be paragraphs of text in various fonts and sizes, titles, or captions. Extracting information from the image (graphics block) is more difficult than extracting it from a text block. A text block can be converted to editable text if the constituent script, font, characters, etc. can be recognised. The recognised text can then provide useful information about the graphics block. This can be of immense help when one searches for information in a large database of document images. In this information-rich modern era, one often comes across situations where search results are needed at one's fingertips. There are two basic issues associated with this: (a) representing the bulky raw data in a compact and interactive form, and (b) retrieving relevant information from the database.

Similar resources

Removing Geometric Distortion from Text Using Geometric Information of Text Lines

Document images produced by scanners or digital cameras usually have photometric and geometric distortions. If either of these effects distorts the document, recognition of words from such a document image using OCR is subject to errors. In this paper we propose a novel approach to significantly reduce geometric distortion in document images. In this method we first extract document lines from do...

Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)

Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions. Both deteriorate the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions, aiming to improve OCR results. Our methodology is based on finding text lines by a dynamic local connectivity map and then applying a l...

A New Text Mining Method for Extracting User Context Information to Improve Search Engine Result Ranking

Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual document material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the most similar ones to the user ...

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. A document image is enhanced in preprocessing by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...

A Qualitative Study Analyzing the Pattern of Medical Image Use by Health Experts

Introduction: In the health sector, an image functions as a form of document that can convey a considerable amount of information. Employing this type of information can increase the effectiveness of medical experts' performance. This study aimed to survey how health experts use medical images in their practice. Methods: This applied qualitative study was carried out in 1392 (2013). The study p...

Super-resolution of Defocus Blurred Images

Super-resolution is a process that combines information from several low-resolution images in order to produce an image with higher resolution. In most previous related work, the blurriness associated with low-resolution images is assumed to be due to the integral effect of the acquisition device’s image sensor. However, in practice there are other sources of blurriness as well, inc...

Publication date: 2003